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ABSTRACT 

The  impact  of  the  number  of  satellite  altimeters  providing  sea  surface  height  anomaly  (SSHA)  information 
for  a  data  assimilation  system  is  evaluated  using  two  comparison  frameworks  and  two  statistical  methodologies. 
The  Naval  Research  Laboratory  (NRL)  Layered  Ocean  Model  (NLOM)  dynamically  interpolates  satellite 
SSHA  track  data  measured  from  space  to  produce  high-resolution  (eddy  resolving)  fields.  The  Modular 
Ocean  Data  Assimilation  System  (MOD AS)  uses  the  NLOM  SSHA  to  produce  synthetic  three-dimensional 
fields  of  temperature  and  salinity  over  the  global  ocean.  A  series  of  case  studies  is  defined  where  NLOM 
assimilates  different  combinations  of  data  streams  from  zero  to  three  altimeters.  The  resulting  NLOM  SSHA 
fields  and  the  MODAS  synthetic  profiles  are  evaluated  relative  to  independently  observed  ocean  temperature 
and  salinity  profiles  for  the  years  2001-03.  The  NLOM  SSHA  values  are  compared  with  the  difference  of  the 
observed  dynamic  height  from  the  climatological  dynamic  height.  The  synthetics  are  compared  with  obser¬ 
vations  using  a  measure  of  therm ocline  depth.  Comparisons  are  done  point  for  point  and  for  1°  radius  regions 
that  are  linearly  fit  over  2-month  periods.  To  evaluate  the  impact  of  data  outliers,  statistical  evaluations  are 
done  with  traditional  Gaussian  statistics  and  also  with  robust  nonparametric  statistics.  Significant  error  re¬ 
duction  is  obtained,  particularly  in  high  SSHA  variability  regions,  by  including  at  least  one  altimeter.  Given 
the  limitation  of  these  methods,  the  overall  differences  between  one  and  three  altimeters  are  significant  only 
in  bias.  Data  outliers  increase  Gaussian  statistical  error  and  error  uncertainty  compared  to  the  same  com¬ 
putations  using  nonparametric  statistical  methods. 


1.  Introduction 

Sea  surface  height  anomaly  (SSHA)  measurements 
from  space  provide  a  global  and  nearly  real-time  repre¬ 
sentation  of  ocean  dynamical  features  and  are  the  most 
important  observation  component  of  the  U.S.  Navy’s 
operational  global  ocean  prediction  system.  Observations 
of  SSHA  are  particularly  useful  for  mapping  eddies, 
meandering  currents,  and  fronts  that  are  associated  with, 
for  example,  planetary  waves  and  geostrophic  currents. 
This  information  is  utilized  by  assimilation  into  the  V32° 
Naval  Research  Laboratory  (NRL)  Layered  Ocean 
Model  (NLOM;  Shriver  et  al.  2007).  NLOM  has  dem¬ 
onstrated  skill  as  a  dynamical  interpolator  of  satellite 
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altimeter  observations  transforming  the  along-track  data 
into  high-horizontal-resolution  (eddy  resolving)  SSHA 
fields  that  accurately  represent  the  mesoscale  variability. 

Because  NLOM  is  a  layered  model  with  high  horizontal 
but  low  vertical  resolution,  it  does  not  independently 
provide  a  realistic  vertical  structure.  A  more  complete 
depiction  is  made  by  using  the  SSHA  fields  from  NLOM 
within  the  Modular  Ocean  Data  Assimilation  System 
(MODAS)  to  construct  subsurface  three-dimensional 
temperature  and  salinity  fields.  These  global  fields,  which 
we  call  synthetics,  are  generated  through  the  use  of 
gridded  statistical  relationships  between  SSHA,  sea  sur¬ 
face  temperature  (SST),  and  historical  in  situ  profile 
observations  of  temperature  and  salinity  (Fox  et  al. 
2002a, b).  MODAS  synthetics  are  generated  for  several 
experimental  cases  that  differ  according  to  the  selection 
of  zero  to  three  satellite  altimeter  data  streams  to  be  as¬ 
similated  by  NLOM.  These  synthetic  fields  of  temperature 
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and  salinity  provide  the  basis  for  the  present  study  and  are 
compared  with  independent  in  situ  ocean  profile  obser¬ 
vations  to  evaluate  their  relative  accuracy. 

A  main  goal  of  this  analysis  is  to  evaluate  the  relative 
prediction  accuracy  resulting  from  different  numbers  and 
configurations  of  altimeters  providing  data  for  NLOM 
and  MODAS.  Another  equally  important  goal  is  to  de¬ 
velop  validation  methodologies  for  error  assessment  of 
ocean  prediction  systems  in  general.  To  accomplish  these 
objectives,  we  apply  two  comparison  frameworks  and  two 
statistical  methodologies.  Comparisons  are  made  point 
for  point  and  in  binned  overlapping  1°  radius  regions 
60  days  long.  Error  analyses  are  then  computed  using 
both  traditional  Gaussian  and  also  robust  nonparam- 
etric  statistical  methods.  The  metric  used  for  the  eval¬ 
uation  is  thermocline  depth  (TD)  because  the  SSHA  is 
most  highly  correlated  with  the  thermocline  (Hurlburt 
1986). 

During  the  analysis  time  period,  there  were  three  sat¬ 
ellite  altimeter  platforms  used  in  daily  operations.  Main¬ 
taining  that  level  of  satelhte  coverage  is  expensive,  and 
thus  a  clear  understanding  regarding  the  prediction  im¬ 
pact  of  multiple  satellites  is  of  interest.  Recent  scientific 
literature  shows  the  importance  of  satellite  altimeters  and 
their  influence  on  ocean  prediction.  Use  of  the  Navy’s 
Coastal  Ocean  Model  (NCOM)  to  predict  SSHA,  in 
assimilative  and  nonassimilative  modes,  was  evaluated 
relative  to  tide  gauges  by  Barron  et  al.  (2004).  Other 
studies  assimilate  altimeter  SSHA  and  surface  drifter  data 
together  (e.g.,  Lin  et  al.  2007;  Fan  et  al.  2004;  Ishikawa 
et  al.  1996).  In  a  study  by  Smedstad  et  al.  (2003),  identical 
twin  numerical  experiments  simulated  the  SSHA  error 
versus  the  number  of  assimilated  satellite  altimeters  in 
a  V6°  version  of  NLOM.  The  present  analysis  provides 
a  similar  error  comparison  but  with  assimilation  of  real 
altimeter  measurements  in  Vyf  NLOM  and  MODAS  and 
in  situ  observations  as  the  validation  reference. 

Numerical  models  can  capture  the  statistical  charac¬ 
teristics  of  the  spatial  and  temporal  variability  relatively 
well.  Climate  studies  often  require  that  the  underlying 
mechanisms  and  general  character  of  the  prediction  is 
correct  (e.g.,  Doney  et  al.  2007),  while  the  specific  skill 
of  predicting  temperature  and  salinity  at  a  point  is  less 
important.  Other  validation  studies  evaluate  the  long¬ 
term  temporal  variability  of  models  (Kara  and  Hurlburt 
2006).  More  detailed  predictions  are  often  required  by 
the  fishing  industry,  the  navy,  and  others  for  prediction 
of  the  synoptic  variability.  For  example,  the  positions  of 
fronts  are  often  used  by  the  maritime  community  to 
evaluate  model  skill  (e.g.,  Oey  et  al.  2005). 

An  even  more  detailed  method  for  comparing  model 
results  is  to  compare  the  model  at  the  exact  location  of 
the  observation  point  for  point.  This  type  of  comparison 


is  a  very  stringent  measure  for  model  evaluation,  be¬ 
cause  the  observations  are  influenced  by  all  the  physical 
processes  present  in  the  ocean,  some  of  which  may  not 
be  adequately  represented  in  the  prediction  system.  For 
example,  turbulent  diffusion  is  parameterized,  and  in¬ 
ternal  waves  and  tides  are  not  included  in  the  prediction 
systems  we  evaluate  here.  Point-for-point  comparisons 
can  exaggerate  the  influence  of  small  displacement  er¬ 
rors  of  mesoscale  features.  If  the  predictions  represent 
the  mesoscale  features  relatively  well  but  with  slight 
shifts  in  the  horizontal,  relatively  large  point-for-point 
errors  may  occur. 

A  second  type  of  comparisons  begins  with  overlapping 
data  bins  with  a  1°  radius  and  60-day  time  window.  The 
data  in  these  bins  are  then  linearly  fit  in  a  least  squares 
sense,  providing  a  statistical  representation  (superposi¬ 
tion  data)  of  thermocline  depth  within  the  bin.  This 
methodology  is  applied  to  produce  both  superobservation 
and  superprediction  data.  The  main  advantages  are  an 
evening  of  the  data  weights  between  high-  and  low- 
density  observation  regions  and  estimates  of  time  rate 
of  change.  Results  using  this  procedure  are  compared 
with  results  from  point-for-point  comparisons  to  eval¬ 
uate  the  impact  of  the  comparison  framework. 

Both  traditional  Gaussian  and  robust  nonparametric 
statistical  methods  are  used  in  the  present  analysis.  Ocean 
observations  rarely  exhibit  Gaussian  data  distributions 
that  are  assumed  when  using  Gaussian  statistical  methods. 
To  evaluate  the  impact  of  non-Gaussian  data  distributions, 
nonparametric  and  Gaussian  statistical  methods  are  ap¬ 
plied  to  all  comparisons  in  this  analysis. 

In  section  2,  the  observations  are  described,  and  spe¬ 
cial  attention  is  paid  to  the  vertical  resolution  because  it 
has  a  large  influence  on  the  TD  computations.  The  dy¬ 
namically  interpolated  SSHA  fields  from  NLOM  are 
described  in  section  3a,  and  sections  3b  and  3c  describe 
MODAS  and  the  details  of  the  synthetic  variants,  re¬ 
spectively.  The  TD  metric  and  superdata  and  statistical 
methods  are  described  in  section  4.  The  results  are 
presented  in  section  5,  with  a  summary  of  conclusions  in 
section  6. 

2.  Profile  observations 

Observed  profiles  of  temperature  and  profile  pairs  of 
temperature  and  salinity  from  the  World  Ocean  Data¬ 
base  2005  (WOD05;  Boyer  et  al.  2006),  the  U.S.  Navy’s 
Master  Oceanographic  Observation  Dataset  (MOODS; 
Teague  et  al.  1990),  and  Argo  (Roemmich  et  al.  2004) 
are  combined.  Because  the  data  sources  for  MOODS 
and  WOD05  are  mostly  the  same,  much  of  these  data 
are  duplicated,  but  each  set  contains  unique  profiles. 
The  WOD05  has  less  stringent  quality-control  (QC) 
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(a) 


102391  observation  profiles  with  T  only 


(b) 


67593  observation  profiles  with  T  and  S 


160  -160  -120 
Longitude 

Fig.  1.  The  locations  of  the  publicly  available  profile  observations  that  have  (a)  temperature  and  (b)  paired  values  of 

temperature  and  salinity. 


procedures  than  MOODS.  Argo  data  are  contained  in 
the  MOODS  and  WOD05  datasets  but  not  up  to  the 
present  time.  In  addition,  the  Argo  profiles  in  MOODS 
are  from  the  real-time  system  and  thus  lack  some  of  the 
QC  procedures  specifically  designed  for  floats  performed 
at  the  Argo  data  acquisition  centers  (DAC).  Also,  further 
QC  recommendations  are  periodically  available,  such  as 
the  pressure  sensor  error  that  was  discovered  for  certain 
float  types  (available  online  at  http://www.argo.ucsd. 
edu).  For  these  reasons,  the  latest  Argo  data  from  the 
DACs  have  been  used  in  place  of  the  same  profiles 
present  in  WOD05  and  MOODS. 

The  data  for  this  study  are  temperature  profiles  from 
2001  through  2003  that  have  the  shallowest  depth  level 
above  12  m,  the  deepest  depth  level  below  150  m,  and 


at  least  temperature  values.  Most  profiles  are  expend¬ 
able  bafhythermographs  (XBTs),  but  many  are  from 
conductivity-temperature-depth  (CTD)  recorders.  The 
total  number  of  profiles  for  this  time  period  is  225  160, 
with  73  549  having  profiles  pairs  of  temperature  T  and 
salinity  S  (Fig.  la)  and  151  611  having  T  only  (Fig.  lb). 
An  additional  restriction  is  that,  in  the  upper  150  m,  no 
profile  may  have  any  gaps  in  depth  levels  exceeding 
25  m.  This  restriction  reduces  the  numbers  to  222  772 
total,  72  586  for  T  and  S,  and  150  186  for  T  only. 

Although  the  data  are  quality  controlled,  errors  in¬ 
cluding  profile  location,  XBT  drop  rate,  and  other  po¬ 
tential  errors  may  still  exist.  The  effect  of  location  errors 
and  how  the  QC  procedures  can  miss  this  type  of  error 
are  discussed  by  Kara  et  al.  (2009). 
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3.  Prediction  system 

a.  NLOM 

NLOM  is  a  global  numerical  primitive  equation  lay¬ 
ered  formulation  model  with  six  dynamic  layers  and  a 
bulk  mixed  layer.  The  operational  eddy-resolving  Vyf 
global  ocean  prediction  system  has  V32°  by  45/1024° 
(latitude  by  longitude)  resolution  and  an  embedded 
mixed  layer  depth  (MLD)  model  independent  from 
the  six  dynamic  layers  (Shriver  et  al.  2007).  The  mixed 
layer  can  extend  across  more  than  one  dynamical  layer 
(Wallcraft  et  al.  2003),  and  the  system  assimilates  both 
satellite  IR  SST  (Barron  and  Kara  2006)  and  SSHA 
observations  (Barron  et  al.  2009). 

The  SSHA  assimilation  scheme  was  developed  for  the 
efficiency  requirements  of  operational  systems  and  uses 
an  incremental  updating  technique  (Smedstad  et  al. 
2003).  First,  an  optimum  interpolation  deviation  analy¬ 
sis  of  SSHA  is  performed  using  a  model  forecast  as  a  first 
guess  and  the  mesoscale  covariance  functions  of  Jacobs 
et  al.  (2001).  The  deviation  analysis  is  used  to  apply 
weights  to  all  observations  according  to  their  space  and 
time  locations,  in  a  3-day  window,  during  which  the 
observations  are  incrementally  updated  in  the  model. 
The  satellite  altimeter  products  used  for  assimilation  are 
from  1)  Jason-1,  2)  European  Remote  Sensing  Satellite-2 
{ERS-2)i Envisat,  and  3)  the  Geosat  Eollow-On  (GEO). 
SST  is  assimilated  through  a  heat  flux  relaxation  scheme 
that  depends  on  a  3-h  e-folding  time  scale  and  the  mixed 
layer  depth.  The  surface  information  is  transferred 
through  all  dynamic  layers  via  the  statistical  inference 
technique  of  Hurlburt  et  al.  (1990).  This  technique  up¬ 
dates  the  pressure  field  in  all  layers  below  the  surface. 

An  important  aspect  of  NLOM  is  its  estimate  of  the 
bottom  pressure  anomaly,  which  is  used  to  split  the  SSH 
signal  into  steric  and  nonsteric  components.  The  SSHA 
used  in  this  study  is  NLOM’s  total  SSH  minus  the  SSH 
proportional  to  the  bottom  pressure  anomaly  minus 
NLOM’s  long-term  mean  steric  SSH  (Barron  et  al. 
2007).  As  a  result,  this  SSHA  is  the  steric  SSH  anomaly 
that  is  required  for  MOD  AS  (see  section  3b). 

A  modified  version  of  the  Vi2°  5-min  gridded  elevations/ 
bathymetry  for  the  world  (ETOP05)  bottom  topogra¬ 
phy  (National  Oceanic  and  Atmospheric  Administration 
1986)  is  used  in  the  NLOM  simulations.  The  topography 
is  first  interpolated  to  the  model  grid  and  then  smoothed 
twice  with  a  nine-point  smoother  to  reduce  energy  gen¬ 
eration  at  smaller  scales  that  are  poorly  resolved  by  the 
model.  The  maximum  depth  of  the  model  is  set  at  6500  m. 
The  minimum  depth,  set  at  200  m,  is  used  as  the  model 
boundary,  with  a  few  exceptions  where  shallower 
depths  are  needed  to  connect  semienclosed  seas.  Shelf 
regions  are  excluded,  because  the  NLOM  formulation 


requires  all  layers  to  exist  with  positive  thickness  at  all 
points. 

Although  assimilation  of  SSHA  has  an  effect  on 
NLOM’s  embedded  MLD  through  the  divergence  of 
near-surface  layer  currents,  its  influence  is  more  strongly 
seen  in  the  TD.  The  influence  of  SSHA  on  MLD  is 
indirect  through  the  prognostic  equation  for  MLD 
(Wallcraft  et  al.  2003), 

d(h  ) 

-f  V-(/i  v,)  =  w  ,  (1) 

where  is  the  thickness  of  the  embedded  mixed  layer, 
(Om  is  the  entrainment  rate  at  the  base  of  the  mixed  layer, 
and  Vj  is  the  velocity  of  NLOM’s  top  layer.  Assimilated 
SSHA  is  used  to  update  layer  thicknesses  (which  are 
related  to  the  pressure  gradient)  and  a  geostrophic  cor¬ 
rection  is  applied  to  currents  outside  the  near-equatorial 
band  (approximately  5°N-5°S)  to  rebalance  the  currents 
with  respect  to  the  new  model  SSHA.  Because  diver¬ 
gence  is  a  second-order  computation,  we  do  not  use  the 
NLOM  embedded  MLD  in  the  analysis  metrics.  Instead, 
we  compare  the  dynamic  height  deviations  of  in  situ 
profile  observations  from  climatology  with  NLOM’s  steric 
SSHA.  We  also  use  NLOM’s  SSHA  within  MOD  AS  to 
generate  three-dimensional  synthetic  temperature  and 
salinity  fields  for  direct  comparison  with  in  situ  profile 
observations. 

A  set  of  simulations  using  NLOM  were  run  for  years 
2001-03,  differing  only  in  the  number  of  altimeter  data 
streams  assimilated,  from  0  to  3.  Each  altimeter  data 
stream  provides,  in  one  day,  approximately  35  000  ob¬ 
servations  globally.  An  additional  three-altimeter  sim¬ 
ulation  was  started  from  a  different  initial  condition 
(same  day,  different  climatological  year).  Both  three- 
altimeter  cases  produce  similar  results,  except  in  areas 
where  the  variability  is  nondeterministic. 

b.  MODAS 

MOD  AS  estimates  profiles  of  temperature  and  salinity 
from  inputs  of  SST  and/or  SSHA  using  gridded  poly¬ 
nomials  at  standard  depths  with  coefficients  determined 
by  least  squares  fit  to  historical  observations  (Fox  et  al. 
2002a,b).  The  SSHA  input  for  the  greatest  MODAS  ac¬ 
curacy  is  the  offset  from  the  steric  height  anomaly  relative 
to  1000  m  in  the  background  climatology.  If  SST  and/or 
SSHA  are  not  available,  the  profile  estimates  revert  to 
the  MODAS  bimonthly  climatology.  Both  the  MODAS 
background  climatology  and  regression  coefficients  are 
defined  in  a  database  centered  on  15  January,  15  March, 
15  May,  15  July,  15  September,  and  15  November.  Syn¬ 
thetics  for  a  given  day  are  based  on  coefficients  linearly 
interpolated  in  time  and  space  to  the  desired  location. 
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Table  1.  The  SSHA  estimates  and  other  details  used  in  the  generation  of  MOD  AS  subsurface  synthetics.  All  cases  but  CLIM  use  SST 
from  the  observation  profile.  The  labels  are  used  in  the  figures  and  text  to  denote  each  case. 


No.  Label 


Description 


1 

2 

3 

4 


5 

6 

7 

8 
9 

10 


CLIM  The  MOD  AS  background  climatology  without  SSHA  or  SST  input  (the  only  case  without  the  observed  SST). 

TONLY  The  MOD  AS  synthetic  without  SSHA  input. 

MOD  AS  The  two-dimensional  optimally  interpolated  SSHA  fields  produced  using  all  three  available  altimeters  by 

MODAS  independently  of  NLOM. 

BEST  The  difference  of  the  dynamic  height  computed  from  the  observation  profile  and  the  dynamic  height  of  the 

MODAS  background  climatology.  This  is  the  SSHA  value  that  corresponds  to  MODAS’s  best  possible 
representation  of  the  observation.  This  is  unavailable  in  practical  applications  and  represents  the  upper  limit 
on  prediction  accuracy. 

AO  NLOM  SSHA  produced  without  SSHA  assimilation.  Surface  fluxes  and  SST  assimilation  is  still  used. 

All  The  NLOM  SSHA  assimilation  is  from  Jason-1  only. 

AIG  The  NLOM  SSHA  assimilation  is  from  GFO  only. 

A2EG  The  NLOM  SSHA  assimilation  is  from  ERS-21  Envisat  and  GEO. 

A3EGT  The  NLOM  SSHA  assimilation  from  all  available  altimeters:  Jason-1.  ERS-21 Envisat,  and  GFO. 

A3EGT2  As  in  A3EGT,  but  the  initial  conditions  are  perturbed  slightly  to  help  identify  the  nondeterministic  response 
to  the  initial  conditions. 


Climatological  profiles  are  linearly  interpolated  in  space 
and  time  as  well. 

For  this  analysis,  we  provide  SSHA  and  SST  to  MODAS 
to  produce  synthetic  profiles  of  temperature  and  salinity 
from  which  we  compute  the  thermocline  depth  param¬ 
eters  described  in  section  4a.  To  test  the  impact  of  as¬ 
similating  altimeter  data,  we  pair  SST  from  the  observed 
ocean  profiles  with  SSHA  from  a  series  of  NLOM  ex¬ 
periments.  In  addition,  climatology  profiles  are  defined 
by  a  zero  SSHA  and  climatological  SST. 

c.  Synthetics 

A  set  of  synthetic  profiles  is  derived  using  MODAS  at 
each  of  the  observed  profile  locations  and  times  de¬ 
scribed  in  section  2.  All  of  the  synthetics  use  the  shal¬ 
lowest  temperature  value  from  the  observed  profile  as 
the  SST  input  for  MODAS;  the  climatology  profiles  use 
climatological  SST  and  zero  SSHA.  Eight  different 
SSHA  estimates  are  used  to  create  different  sets  of 
synthetics.  The  descriptions  and  labels  for  each  are  listed 
in  Table  1.  For  this  analysis,  we  focus  on  the  relative 
impact  that  the  number  of  satellite  altimeters  have  on 
the  combined  NLOM  and  MODAS  predictions  of 
SSHA  and  thermocline  depth. 

4.  Methods 

a.  Metrics 

Two  TD  parameters  computed  from  temperature  pro¬ 
file  observations  and  synthetics  made  during  2001-03 
are  listed  in  Table  2.  The  parameters  are  based  on  two 
isothermal  layer  depth  (ILD)  algorithms  computed  us¬ 
ing  freely  available  software  described  in  Lorbacher 
et  al.  (2006)  and  Kara  et  al.  (2000).  These  methodologies 


can  also  be  applied  to  density  profiles  to  compute  MLD, 
but  this  is  not  considered  here  because  not  all  profiles 
have  both  temperature  and  salinity  and  we  are  primarily 
considering  thermocline  depth. 

The  ILD  algorithms  have  been  well  tested,  and  their 
characteristics  are  known.  For  example,  the  Lorbacher 
et  al.  (2006)  ILD  selects  the  first  curvature  peak  of  the 
temperature  profile  and  tends  to  be  relatively  shallow. 
Because  the  Lorbacher  et  al.  (2006)  ILD  method  is  cur¬ 
vature  based,  the  label  we  use  is  ILDy,  where  the  sub¬ 
script  V  represents  the  gradient  associated  with  curvature. 
The  Kara  et  al.  (2000)  ILD  method  selects  the  depth 
where  a  change  in  temperature  relative  to  the  near  sur¬ 
face  exceeds  a  threshold.  Because  the  Kara  et  al.  (2000) 
ILD  is  based  on  a  change  in  temperature,  the  label  we 
use  is  ILDa,  where  the  subscript  A  indicates  that  it  is 
a  threshold  methodology.  Hereafter,  the  unadorned  ILD 
will  refer  to  the  isothermal  layer  depth  in  general.  The 
ILDa  values  tend  to  be  deeper  than  ILDv  and  are  most 
closely  associated  with  the  seasonal  ILD  (e.g.,  Helber 
et  al.  2008). 


Table  2.  The  upper  thermocline  metric  parameters  that  are 
computed  from  observations  and  synthetics. 


No. 

Label 

Description 

1 

ILDi 

Temperature  threshold  ILD  Kara  et  al.  (2000) 

2 

ILDv 

Temperature  profile  curvature  ILD  Lorbacher 
et  al.  (2006) 

3 

TDa 

TD  below  ILDa 

4 

TDv 

TD  below  ILDv 

5 

TSa 

TS,  (Ar/2)/(TDA  -  ILDa):  AT  is  from  the 

TDa  computation 

6 

TSv 

TS,  (Ar/2)/(TDv  -  ILDv):  AT  is  from  the 

TDv  computation 
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Fig.  2.  The  computation  of  TD  for  a  typical  temperature  profile  with  a  relatively  (a)  strong  and  (b)  weak  ther- 
mocline.  The  temperatures  at  the  ILD,  TD,  and  ILD  +  100  m  are  Tq,  -  ATO,  and  Tq  —  AT,  respectively.  The 
thermocline  gradient  is  computed  two  ways,  as  labeled.  The  ILD  is  computed  using  the  threshold  method  (ILD a)  and 
has  a  value  of  (a)  49.5  and  (b)  48.6  m  and  the  corresponding  TD^  is  (a)  56.5  and  (b)  68.6  m. 


For  a  small  percentage  of  profiles,  each  ILD  method 
has  certain  shortcomings.  For  example,  for  profiles  where 
the  water  column  is  mixed  all  the  way  to  the  bottom  of  the 
profile,  ILDv  returns  a  value  of  zero.  This  is  because  the 
algorithm  cannot  find  a  curvature  peak  (Kara  et  al.  2009). 
For  similar  cases,  particularly  at  high  latitudes,  the 
threshold  value  for  ILDa  is  too  large  and  returns  an  ILD 
that  is  unrealistically  deep  or  at  the  bottom  of  the  profile. 
To  eliminate  this  problem,  profiles  are  discarded  if  the 
ILD  estimates  are  at  the  bottom  of  the  profile  and  more 
than  50  m  from  the  ocean  bottom  (based  on  the  navy’s 
2-min  bathymetry)  because  in  these  cases  the  observa¬ 
tions  do  not  allow  for  an  accurate  estimate  of  the  ILD. 

The  TD  parameter  is  computed  by  first  determining 
two  temperatures:  1)  the  temperature  of  the  profile  at 
the  ILD  Tq  and  2)  the  change  in  temperature  between  Tq 
and  the  temperature  100  m  below  the  ILD,  AT.  The 
depth  below  the  ILD  where  the  temperature  is  equal  to 
Tq  —  /AT/2  defines  TD.  Because  we  have  two  estimates  of 
the  ILD,  there  are  two  corresponding  estimates  of  TD, 
TDa  and  TDy,  which  correspond  to  ILDa  and  ILDy, 
respectively. 

The  TDa  is  shown  graphically  with  labels  in  Figs.  2a,b. 
For  profiles  with  sharp  thermoclines  just  below  the 
ILDa,  the  TDa  is  relatively  close  to  the  ILDa  (Fig-  2a). 
For  weak  thermoclines,  the  TDa  is  further  from  the 
ILDa  (Fig.  2b).  In  this  way,  the  TDa  characterizes  the 


location  of  the  highest  gradients  of  the  upper  thermo¬ 
cline  from  a  simple,  widely  applicable  calculation.  Be¬ 
cause  the  computation  is  a  bulk  estimate  of  the  high 
temperature  gradient  depth,  it  is  insensitive  to  random 
noise  in  the  profile.  Finite  difference  computation  of 
vertical  gradients  in  observation  profiles  is  inherently 
noisy  and  thus  avoided  in  this  analysis.  The  strength  of 
the  thermocline  (TS)  is  the  gradient  of  the  upper  ther¬ 
mocline  between  the  ILD  and  TD  (Table  2;  Fig.  2). 

b.  Superposition 

As  mentioned  in  the  introduction,  observation  time 
and  locations  are  grouped  into  analysis  windows  to  pro¬ 
duce  superposition  data  (“superdata”).  For  this  analysis, 
the  windows  are  1°  radius  circular  regions  60  days  long 
computed  each  month  for  the  entire  analysis  time  period 
from  January  2001  through  December  2003.  The  1°  radius 
circles  are  centered  on  even-numbered  latitude  and  lon¬ 
gitude  locations.  For  consistent  comparison,  the  pre¬ 
dictions  are  sampled  or  interpolated  to  the  time  and 
three-dimensional  locations  of  the  observations  and  are 
binned  in  the  same  manner. 

In  the  analysis  windows,  superdata  are  defined  as 
linear  least  squares  fits  to  the  model  or  observation  data. 
The  superdata  values  at  the  location  and  time  of  the 
analysis  window  (x,  y,  tq)  are  the  mean  Uq  and  temporal 
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trend  ai.  The  linear  fit  minimizes  the  squared  residuals  b 
over  all  the  data  within  the  analysis  window,  where  the 
residual  Bj  for  the  jth  data  point,  data,,  within  the  anal¬ 
ysis  window  is  defined  as 

Bj  =  flg(x,  y,  Tq)  +  y,  t^Xt.  -  t^)  -  data^,  (2) 

where  tj  is  the  time  vector  in  units  of  days  and  tq  is  the 
value  at  the  start  of  the  60-day  period  such  that  oq  is  the 
mean  and  Aj  is  the  rate  of  change  per  day.  The  residual, 
B,  is  minimized  in  a  linear  least  squares  sense  and  the 
mean  fit  error  is  computed  as 


where  N  is  the  number  of  data  points  in  the  analysis 
window.  These  fit  parameters,  Oq,  a^,  and  e,  are  computed 
to  produce  the  superobservations  or  superpredictions 
only  for  bins  where  the  data  span  at  least  30  of  60  days. 

The  superdata  methods  reduce  the  weight  of  individual 
data  points  in  regions  of  the  ocean  with  relatively  high 
observation  density.  This  is  because  the  subsequent  sta¬ 
tistical  analysis  treats  every  superdata  point  the  same.  As 
a  result,  regions  with  a  relatively  large  number  of  points 
have  the  same  weight  as  regions  with  a  small  number  of 
points.  Both  superdata  and  point-for-point  results  are 
discussed  in  section  5. 

c.  Statistical  error  analysis  methods 

Standard  data  quality-control  procedures  are  not  per¬ 
fect  and  allow  some  erroneous  data  through.  In  an  at¬ 
tempt  to  reduce  the  impact  of  this  noise,  we  employ 
robust  nonparametric  statistical  methods,  because  these 
methods  are  less  sensitive  to  the  influence  of  outlier  data 
values  (e.g.,  Rousseeuw  and  Leroy  1987).  Because  the 
nonparametric  methods  have  analogous  counterparts 
in  traditional  Gaussian  statistics,  we  list  those  first  and 
explain  the  robust  methods  second. 

For  assessment  of  prediction  accuracy,  we  compute 
the  mean  bias  (MB)  as 

N 


and  correlation  coefficient  R,  as 


R  = 


N 


-  PXOj  -  o) 

i 

1/2 

1/2 

(6) 


where  N  is  the  number  of  samples.  For  the  superdata 
comparison  framework,  N  would  be  the  number  of  anal¬ 
ysis  windows;  for  the  point-for-point  comparison  frame¬ 
work,  A  would  be  the  number  of  actual  observation  points. 
The  corresponding  Pj  and  Oj  are  the  NLOM  SSHA  or 
MODAS  synthetic  predictions  and  the  observations,  re¬ 
spectively,  in  either  point-for-point  or  superdata  form.  The 
sign  convention  of  bias  is  Pj  —  Oj,  such  that  positive  bias 
corresponds  to  synthetics  larger  than  the  observations. 
Mean  values  are  denoted  with  a  bar,  P  and  O,  and  are 
computed  as 


N 


and 


N 


In  addition,  the  unbiased  RMSE  is  also  computed  such 


that 

N 

(7) 

and 

RMSE 

=  Jmb2  +  rmse2„. 

(8) 

Equations  (4)-(8)  are  the  standard  formulation  for 
computing  statistics  on  variables  with  Gaussian  distribu¬ 
tions.  In  practice,  oceanographic  data  are  rarely  Gaussian 
and  using  the  standard  computations  is  often  misleading. 
The  robust  or  nonparametric  methods  used  in  this  anal¬ 
ysis  are  the  biweight  mean  and  standard  deviation  de¬ 
scribed  by  Lanzante  (1996)  and  used  by  others  (e.g.,  Zou 
and  Zeng  2006).  With  these  methods,  the  analogous 
quantity  for  mean  bias  is 

MBbw  =  (F.  -  O for  ,/  =  l,2,3,  ...  ,A,  (9) 


the  root-mean-square  error  (RMSE)  as 


and  for  root-mean-square  error  is 


RMSE  = 


(5) 


=  iP. 


for  7  =  1, 2, 3, 


,N, 

(10) 
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where  (  represents  the  hi  weight  mean  operator  and 
((  represents  the  hi  weight  standard  deviation  op¬ 
erator  (see  the  appendix  for  details).  Because  the  bi- 
weighted  standard  deviation  is  an  unbiased  estimate  of 
the  variability  scale,  Eq.  (8)  is  used  to  provide  the  bi- 
weighted  estimate  of  RMSE, 


RMSE”*  =  (11) 


The  biweight  methods  are  robust  in  that  they  are 
nonparametric,  meaning  that  the  data  distribution  needs 
not  resemble  an  assumed  form.  The  biweight  method¬ 
ology  applies  less  weight  to  data  points  that  are  statis¬ 
tically  considered  outliers  of  the  natural  distribution  of 
the  data.  The  details  of  the  biweight  computations  are 
described  in  the  appendix.  To  evaluate  the  impact  of  the 
robust  error  statistics,  the  results  from  both  Gaussian 
and  nonparametric  statistical  methods  are  presented  in 
section  5. 

For  consistent  comparisons  of  data  with  different  er¬ 
ror  variance  levels,  we  use  “summary”  diagrams  (Jolliff 
et  al.  2009).  The  advantage  of  these  diagrams  is  that 
many  experimental  results  can  be  presented  on  one 
figure  for  quick  evaluation  of  the  complete  error  statis¬ 
tics.  The  quantities  are  scaled  by  the  standard  deviation 
of  the  observations  so  that  the  magnitudes  of  values  on 
a  summary  diagram  tend  to  be  less  than  one.  A  value 
of  one  indicates  that  the  errors  are  as  large  as  the 
observation  standard  deviation.  The  y  axis  indicates 
the  scaled  MB, 


MB'  =  . 


MB 


n  1/2  ’ 


(12) 


or  the  biweight  version, 


mb'”"' 


mb'’* 


(13) 


and  the  x  axis  denotes  the  scaled  unbiased  RMSE, 

RMSE., 


RMSE'  . .  .  =  - 

unbiased 


“'unbiased 


-ll/2  ’ 


(14) 


or  the  biweight  version, 


RMSE 


bw' 

unbiased 


RMSE 


bw 

unbiased 


{{O0 


bw 


(15) 


To  investigate  the  impact  of  outliers,  results  from  both 
the  Gaussian  statistics  [Eqs.  (12)  and  (14)]  and  the  non¬ 
parametric  statistics  [Eqs.  (13)  and  (15)]  are  presented  in 
summary  diagrams  and  discussed  in  section  5. 

The  scaling  is  important  because  it  facilitates  compar¬ 
ison  of  different  datasets  with  varying  variance  levels. 
This  is  the  case  in  this  experiment  becanse  TDa  and  TDy 
have  different  variance  levels  by  virtue  of  differences  in 
the  ILD  methodology.  The  summary  diagrams  allow 
consistent  comparison  of  these  side  by  side. 

Statistical  significance  is  computed  with  the  use  of 
bootstrap  standard  error  estimates  (Efron  and  Tibishirani 
1986).  The  error  bars  presented  in  the  figures  of  section  5 
represent  a  100  independent  draw  bootstrap  standard 
error  estimate. 


5.  Results 

a.  Global  consistency 

Inspection  of  the  model  results  relative  to  observa¬ 
tions  reveals  a  remarkable  similarity  that  bolsters  con¬ 
fidence  in  the  U.S.  Navy’s  ocean  prediction  system.  For 
example.  Figs.  3a,b  show  the  mean  SSHA  values  (su¬ 
perdata  fit  parameter  aq)  for  the  global  observations  and 
the  A3EGT  SSHA  (the  SSHA  used  in  the  A3EGT 
synthetics  described  in  Table  1),  respectively.  Each  dot 
represents  regions  that  have  enough  data  to  span  30  days 
of  the  60-day  bins.  The  parameter  aq  represents  the 
mean  of  1°  diameter  circles  centered  on  even  latitude 
and  longitudes  for  a  60-day  period  starting  on  1  January 
for  the  years  2001-03.  In  this  format,  comparing  the  two 
fields,  the  spatial  variability  appears  similar  and  the 
magnitude  is  close.  Similarly,  TDa  from  observations  and 
the  A3EGT  synthetics  are  shown  in  Figs.  4a,b.  Compar¬ 
isons  with  ILD  climatologies  (available  online  at  http:// 
www7320.nrlssc.navy.mil/nmld/nmld.html;  Kara  et  al. 
2003)  are  consistent,  though  the  TD^  is  expected  to  be 
deeper. 


b.  SSHA  accuracy 

To  gain  an  understanding  of  the  relative  accuracy  of 
the  NLOM  SSHA  estimates  before  using  them  to  con¬ 
struct  synthetics  with  MODAS,  we  compare  them  to 
the  BEST  SSHA  estimate  (the  SSHA  used  for  the  BEST 
synthetics  described  in  Table  1).  This  estimate  is  the 
closest  we  have  to  direct  in  situ  observations  of  SSHA. 
As  mentioned  in  section  4c,  consistent  comparisons  can 
be  made  with  summary  diagrams  and,  consequently, 
all  SSHA  errors  (relative  to  the  BEST  SSHA)  using 
both  nonparametric  and  Gaussian  statistical  methods 
are  presented  in  Fig.  5.  Using  the  superdata  comparison 
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Fig.  3.  The  fit  parameter  (superdata  mean;  in  m)  for  the  (a)  BEST  and  (b)  A3EGT  SSHA  estimates  for  the 
global  ocean  for  a  60-day  analysis  window  starting  1  Jan  for  the  years  2001-03.  Values  exist  only  where  data  span  at 
least  30  of  the  60  days. 


framework  (Fig.  5a),  we  find  that  the  SSFtA  estimates 
using  two  or  three  altimeters  have  the  smallest  bias, 
whereas  the  MOD  AS  SSFtA  estimate  has  the  smallest 
RMSE.  In  Fig.  5  and  the  rest  of  the  summary  diagrams, 
the  greater  accuracy  is  indicated  by  values  closer  to  the 
top-left  corner  of  the  figures.  Farther  toward  the  left 
represents  smaller  RMSE,  and  farther  toward  the  top 
represents  smaller  bias.  Bias  is  decreasing  upward,  be¬ 
cause  in  the  cases  considered  here  it  is  always  negative. 
MOD  AS  tends  to  have  a  shallow  bias  in  MED  and  TD 
because  of  the  unrealistically  smooth  vertical  gradients 
of  the  synthetics. 


The  Gaussian  statistical  methods  produce  similar  re¬ 
sults  to  those  for  the  nonparametric  methods,  with  the 
exception  of  the  AO  case  (Fig.  5a),  which  has  larger 
RMSE.  For  the  traditional  Gaussian  statistical  results, 
the  bootstrap  standard  error  bars  are  unstable  and  sen¬ 
sitive  to  the  number  of  independent  draws.  For  this 
reason,  some  of  these  error  bars  extend  outside  the  axis 
limits  in  Fig.  5.  The  point-for-point  comparison  frame¬ 
work  has  a  reduction  in  bias  error  with  an  increase  in 
RMSE  error  only  for  Gaussian  statistics  (Fig.  5b). 

Because  standard  error  bars  for  the  one-  and  two- 
altimeter  cases  do  not  overlap,  two  altimeters  are 
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Fig.  4.  The  fit  parameter  oq  (superdata  mean;  in  m)  for  the  (a)  observed  and  (b)  A3EGT  synthetic  TDa  estimate  for 
the  global  ocean  for  a  60-day  analysis  window  starting  1  Jan  for  the  years  2001-03. 


significantly  more  accurate  than  just  one.  One  altimeter 
is  also  clearly  better  than  the  nonassimilative  NLOM 
case.  It  is  interesting  to  note  that  the  AIG  has  smaller 
errors  than  the  All,  suggesting  the  higher  horizontal 
resolution  of  the  GFO  orbit  may  reduce  the  overall  bias. 
This  is  consistent  with  the  expected  error  estimates  of 
Jacobs  et  al.  (2002). 

c.  TD  accuracy 

The  primary  metric  for  this  analysis  is  the  TD,  which  is 
computed  from  the  MOD  AS  synthetics.  An  advantage 
of  this  metric  is  that  we  are  able  to  compute  TD  from 


both  the  observation  and  the  synthetic  profiles,  whereas 
for  SSHA  there  is  no  direct  in  situ  observation.  In  this 
section,  summary  diagrams  for  all  synthetics  for  both 
TD  metrics  are  presented  in  Figs.  6a-d.  In  these  figures, 
the  lowest  error  is  again  closest  to  the  top-left  corner  of 
the  plot,  because  there  is  a  shallow  TD  bias  (MB '  <  0) 
for  all  synthetics  and  RMSE'  >  0.  The  error  bars  are 
from  a  100  independent  draw  bootstrap  standard  error 
estimate.  Figure  6a  uses  both  the  superdata  and  non- 
parametric  methods  and  therefore  should  be  the  most 
reliable  global  error  estimate.  Figure  fib  also  uses  the  su¬ 
perdata  methods,  but  with  Gaussian  statistics.  Figures  6c, d 
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3  SSHA  error,  super-data  b  SSHA  error,  point-for-point 


Fig.  5.  The  (a)  superdata  and  (b)  point-for-point  summary  diagram  of  biweight  statistics  RMSE|J"j,jj|j,.j  (x  axis)  and  MB*’"  (y  axis,  black; 
25%  gray  error  bars)  and  Guassian  statistics  (x  axis)  and  MB'  (y  axis,  50%  gray;  75%  gray  error  bars)  for  SSHA  for  the  six 

NLOM  and  MOD  AS  SSHA  estimates  denoted  by  the  symbols  in  the  legend.  The  SSHA  values  are  those  associated  with  the  synthetics 
described  in  Table  1.  The  error  bars  represent  a  100  independent  draw  bootstrap  standard  error  estimate. 


are  point-for-point  comparisons  with  nonparametric 
and  Gaussian  statistics,  respectively. 

The  statistical  methods  have  the  largest  impact  on  the 
results  (cf.  Fig.  6a  with  6b  and  Fig.  6c  with  6d).  Note  that 
the  axis  limits  are  larger  in  Figs.  6b, d.  Outliers  increase 
the  errors  when  using  Gaussian  statistical  techniques 
because  the  nonparametric  statistics  reduce  the  weight 
of  points  that  exist  far  from  the  median  absolute  de¬ 
viation  (see  appendix),  whereas  the  impact  of  the  su¬ 
perdata  versus  point-for-point  comparisons  is  relatively 
small  (cf.  Fig.  6a  with  6c;  cf.  Fig.  6b  with  6d).  There  is, 
however,  less  separation  between  the  TDa  and  TDy 
metrics  for  the  point-for-point  framework.  There  is  also 
a  slight  increase  in  error  levels  for  the  point-for-point 
framework,  suggesting  that  there  are  larger  errors  in 
regions  of  high  data  density. 

Notice  that  the  synthetics  are  more  skillful  for  the 
threshold  method  TDa,  and  the  gradient  method  TDy 
has  the  larger  error  for  all  cases.  This  occurs  because 
MOD  AS  synthetic  profiles  tend  to  have  overly  smooth 
vertical  gradients  that  result  in  a  shallow  bias  for  ILDa 
and  therefore  TDa.  The  shallow  bias  is  exacerbated  by 
the  curvature-based  ILDy  algorithm  when  operated  on 
synthetics  because  of  the  reduced  curvature  relative  to 
the  observed  profiles. 

The  synthetic  with  the  smallest  overall  error  for  both 
metrics  using  nonparametric  statistics  is  BEST  (Figs.  6a,c), 
which  shows  the  optimal  performance  of  the  MOD  AS 
approached  by  using  as  input  the  actual  dynamic  height 
deviation  from  the  observed  profiles  (see  Table  1).  The 
smallest  bias  is  for  CLIM  and  TONLY,  which  have  nearly 


identical  errors.  The  next  smallest  error  is  with  the  three- 
altimeter  synthetics  A3EGT  and  A3EGT2.  The  largest 
error,  in  MB  and  RMSE,  tends  to  occur  with  the  zero- 
altimeter  synthetic  case  AO.  In  terms  of  RMSE,  all  syn¬ 
thetics,  except  BEST  and  AO,  tend  to  have  nearly  the  same 
RMSE.  The  inclusion  of  altimeter  assimilation  tends  to 
have  the  largest  impact  on  the  MB. 

The  results  using  Gaussian  statistics  are  less  clear  be¬ 
cause  the  standard  error  bars  are  larger  and  the  estimates 
are  clustered  closer  together  (Figs.  6b,d).  The  general 
trend  is  maintained,  where  fewer  altimeters  have  larger 
errors,  but  the  separations  between  values  are  generally 
not  significant. 

The  authors  have  not  found  a  suitable  nonparametric 
estimate  for  the  correlation  coefficient.  The  traditional 
estimate  computed  using  Eq.  (6)  for  both  comparison 
frameworks  is  shown  in  Fig.  7.  Although  all  correlations 
are  high,  the  differences  between  the  cases  are  not  sig¬ 
nificant,  with  the  exception  of  the  difference  between  the 
AO  and  the  other  altimeter  assimilative  cases.  A  major 
difference  in  correlation  relative  to  the  results  shown  in 
Fig.  6  is  the  low  correlation  values  of  CLIM  and  TONLY. 
This  is  consistent  because  climatologies  do  not  vary  on 
subseasonal  time  scales;  therefore,  we  would  expect  lower 
correlations.  Climatologies  are  designed  to  represent  long 
space  and  time  scale  averages  such  as  those  computed 
for  Fig.  6. 

The  overall  results  show  that  a  significant  error  re¬ 
duction  is  obtained  by  including  at  least  one  altimeter. 
The  difference  between  one  and  three  altimeters  is  sig¬ 
nificant  only  in  bias.  Given  the  limitations  of  the  time 
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Fig.  6.  (a)  The  superdata  and  nonparametric  statistics  summary  diagram  of  (x  axis)  and  MB*’"  (y  axis)  for  TDa  (black  and 

75%  gray  error  bars)  and  TD^  (50%  and  25%  gray  error  bars)  for  the  10  synthetics  denoted  by  the  symbols  in  the  legend  as  described  in 
Table  1.  (b)  The  superdata  and  Gaussian  statistics  summary  diagram  of  {x  axis)  and  MB'  (y  axis)  for  TDa  (black  and  75% 

gray  error  bars)  and  TD^  (50%  and  25%  gray  error  bars),  (c)  The  point-for-point  data  and  nonparametric  statistics  summary  diagram  of 
RMSEJJJJ^jjij^j  (xaxis)  and  MB*’"  (y  axis)  for  TDa  (black  and  75%  gray  error  bars)  andTDv  (50%  and  25%  gray  error  bars),  (d)  The  point- 
for-point  data  and  Gaussian  statistics  summary  diagram  of  (x  axis)  and  MB'  (y  axis)  for  TDa  (black  and  75%  gray  error 

bars)  and  TDv  (50%  and  25%  gray  error  bars).  The  error  bars  represent  a  100  independent  draw  bootstrap  standard  error  estimate. 


and  space  scales  inherent  in  the  NLOM/MODAS  as¬ 
similation,  we  are  unable  to  discern  a  significant  differ¬ 
ence  in  the  errors  using  two  or  three  altimeters.  The  use 
of  nonparametric  statistics  tends  be  more  effective  in 
discriminating  differences  among  the  MOD  AS  SSHA- 
derived  predictions. 

d.  Error  variability 

To  explore  the  seasonality  of  the  error  and  fit  pa¬ 
rameters  over  the  annual  cycle,  we  choose  two  regions  in 
the  northwest  Pacific  Ocean.  The  high  variability  (HV) 
region  bounded  by  20°-50°N  and  120°E-160°W  has  high 


SSHA  variability  because  of  the  meandering  of  the 
Kuroshio  Current.  The  low  variability  (LV)  region 
bounded  by  0°-20°N  and  130°E-150°W  has  relatively 
low  SSHA  variability. 

In  the  HV  region,  the  fit  error  e  is  driven  by  the  depth 
of  TDa  such  that  e  is  larger  when  TDa  is  deeper  in  the 
months  of  December,  January,  and  March  (cf.  solid  lines 
of  Figs.  8a,c).  The  percent  error,  which  is  the  percent  of 
the  error  relative  to  the  observed  TDa  given  by 
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Fig.  7.  The  correlation  coefficient  of  observed  vs  synthetic  TDa 
for  each  of  the  cases  in  Table  1  as  listed  on  the  x  axis.  The  corre¬ 
lation  is  computed  on  superdata  (black)  and  point-for-point  (gray) 
values,  and  the  corresponding  vertical  lines  represent  the  100  draw 
bootstrap  standard  error  estimate. 


is  related  to  the  rate  of  change  fit  parameter  ai.  When 
the  magnitude  of  is  large  the  magnitude  of  the  percent 
error  is  also  large  (cf.  solid  black  lines  of  Figs.  8b, d).  The 
percent  error  tends  to  be  negative  because  the  synthetics 
have  a  shallow  bias. 

For  the  low  variability  region,  which  is  represented  by 
the  dashed-dotted  lines  in  Fig.  8,  the  effects  of  season¬ 
ality  are  relatively  weak  for  Aq,  Aj,  and  e,  which  change 
little  throughout  the  year.  In  Figs.  8d,e,  percent  error 
and  correlation  coefficient  for  both  HV  and  LV  regions 
are  considered  for  both  the  A3EGT  synthetics  (in  black) 
and  CLIM  (in  gray)  cases.  In  Figs.  8d,e,  the  gray  and 
black  dashed-dotted  lines  are  relatively  close  together 
compared  to  the  solid  lines.  This  suggests  that  the 
A3EGT  and  CLIM  cases  have  similar  errors  in  the  LV 
region.  In  the  HV  regions,  however,  the  CLIM  case  has 
larger  magnitude  in  Fig.  8d  and  lower  correlation  on 
average  in  Fig.  8e.  The  average  correlation  coefficient 
in  the  HV  region  for  Aq  determined  from  the  A3EGT 
synthetics  is  0.93,  higher  than  0.83  for  climatology.  Av¬ 
erage  correlation  is  also  relatively  low  in  the  LV  region, 
0.84  for  both  A3EGT  and  CLIM.  In  the  LV  region,  the 
accuracy  of  the  A3EGT  synthetics  is  not  significantly 
different  from  climatology.  The  value  added  by  altime¬ 
try  in  the  MOD  AS  synthetics  is  evident  only  in  the  re¬ 
gion  of  high  SSH  variability,  where  the  signal  stands  out 
from  the  background  noise. 

The  global  distribution  of  observed  aq  values  can  be 
seen  in  Fig.  9.  For  the  60-day  period  starting  1  February, 
the  northwest  Pacific  Ocean  Kuroshio  Extension  region 


(the  HV  region  in  the  solid  box  of  Fig.  9)  has  relatively 
deep  TDa  in  the  annual  cycle.  In  contrast,  the  60-day 
period  starting  in  August  has  relatively  shallow  TDa 
in  the  Kuroshio  Extension  region  (see  also  Fig.  8).  The 
seasonal  cycle  is  much  weaker  in  the  LV  region  (dashed- 
dotted  box  in  Fig.  9). 

The  rate  of  change  of  TDa  is  given  by  the  fit  parameter 
Ai,  as  shown  in  Fig.  10  globally  for  the  60-day  periods 
starting  in  March  and  November.  In  the  Kuroshio  Ex¬ 
tension  (HV)  region,  ai  is  shoaling  most  rapidly  in  March 
and  April  and  deepening  in  November  and  December 
(Fig.  8).  This  pattern  for  the  annual  cycle  of  TDa  rate  of 
change  applies  generally  to  most  regions  in  the  mid¬ 
latitude  Northern  Hemisphere. 

The  annual  cycle  of  the  percent  error  in  A3EGT 
synthetics  is  linked  to  the  TDa  rate  of  change.  Wherever 
the  magnitude  of  a\  is  large,  the  percent  error  magnitude 
is  large.  For  the  Kuroshio  Extension  region,  the  60-day 
periods  starting  in  April  and  December  have  relatively 
large  percent  error  (Fig.  8);  the  global  distributions  of 
errors  for  these  months  are  shown  in  Fig.  11.  The  largest 
errors  tend  to  be  where  SSHA  variability  is  large:  for 
example,  in  the  eddy  shedding  regions  of  the  Kuroshio 
and  the  Gulf  Stream.  In  regions  of  the  ocean  where  the 
SSHA  variability  is  weak,  such  as  the  LV  region,  the 
errors  are  relatively  small. 

6.  Summary  and  conclusions 

The  impact  of  the  number  of  satellite  altimeters  on 
upper-ocean  predictions  of  thermocline  depth  (TD)  are 
evaluated  using  global,  data  assimilating,  layered  model 
SSHA  analyses.  By  varying  the  number  of  altimeters 
from  zero  to  three,  the  prediction  accuracy  is  determined 
relative  to  a  global  set  of  in  situ  profile  observations.  In 
addition,  methods  for  evaluating  prediction  accuracy  are 
presented  using  two  comparison  frameworks  and  two 
statistical  methodologies.  Comparisons  are  made  point 
for  point  and  for  binned  and  fitted  superdata  regions 
(1°  radius,  60  days).  Both  traditional  Gaussian  and  non- 
parametric  statistical  methodologies  are  applied. 

The  general  results  show  that  accuracy  provided  by 
the  satellite  altimeter  datasets  is  greater  in  regions  of 
high  SSHA  variability,  and  significant  error  reduction  is 
achieved  with  the  addition  of  at  least  one  satellite  al¬ 
timeter  dataset.  Under  the  limitations  of  the  analysis 
methods,  additional  error  reduction  when  assimilating 
data  from  three  altimeters  versus  one  is  significant  only 
in  bias.  These  conclusions  are  drawn  from  both  the 
point-for-point  and  superdata  frameworks  when  using 
nonparametric  statistical  methods.  The  lack  of  signifi¬ 
cant  skill  improvement  between  two  and  three  satellite 
altimeter  datasets  may  be  a  consequence  of  the  length 
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Fig.  8.  The  median  fit  parameters  (a)  ao,  (b)  ai,  and  (c)  e  and  the  (d)  percent  error  and 
(e)  correlation  of  «o  from  A3EGT  synthetics  (in  black)  and  CLIM  (in  gray)  by  month,  relative 
to  the  fit  parameters  determined  directly  from  observations.  The  solid  lines  are  for  the  region 
bounded  by  20°-50°N  and  120°E-160°W  and  the  dashed-dotted  lines  are  for  the  region 
bounded  by  0°-20°N  and  130°E-150°W  in  the  northwest  Pacific  Ocean. 


and  time  scales  associated  with  the  NLOM  assimilation 
and  the  smoothing  associated  with  gridded  climatolog¬ 
ical  coefficients  of  MOD  AS.  As  a  result,  this  system  is 
unable  to  take  full  advantage  of  the  added  spatial  detail 
provided  by  multiple  altimeters,  even  though  each  ad¬ 
ditional  data  stream  adds  approximately  35  000  data 
points  globally  per  day.  The  simulated  error  experiment 
of  Smedstad  et  al.  (2003)  also  indicates  that  the  marginal 
reduction  in  SSHA  error  decreases  for  each  additional 
satellite  data  stream. 


As  expected,  the  BEST  synthetics  have  the  greatest 
accuracy  but  are  unavailable  in  practical  applications 
because  the  observed  profiles  are  used  for  their  deriva¬ 
tion.  The  BEST  SSHA  estimate  is  fully  consistent  with 
the  observed  profile  within  the  context  of  MODAS  syn¬ 
thetics.  This  upper  accuracy  limits  of  MB'  and  RMSE' 
(bias  and  rms  error  scaled  by  the  observations  standard 
deviation)  are  approximately  —0.05  and  0.25,  respec¬ 
tively,  for  nonparametric  statistics  and  the  superdata 
methodology.  The  next  greatest  MB'  accuracy  is  found 
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Fig.  9.  The  fit  parameter  Ou  (superdata  mean;  in  m)  for  the  observed  TD^  for  the  global  ocean  for  a  60-day  analysis 
window  starting  (a)  1  Feb  and  (b)  1  Aug  for  the  years  2001-03.  The  rectangles  represent  the  regions  for  the  com¬ 
putations  in  Fig.  8. 


in  the  results  from  CLIM  and  TONLY.  This  somewhat 
surprising  result  is  due  to  climatologies  being  specifically 
designed  to  globally  minimize  MB'  and  RMSE'.  The 
correlation  levels  of  CLIM  and  TONLY,  however,  are 
lower  than  the  altimeter  assimilative  MODAS  cases. 
The  TONLY  estimates  are  nearly  the  same  as  CLIM, 
suggesting  that  SST  alone  has  little  influence  on  TD. 

In  terms  of  RMSE',  all  of  the  synthetics,  with  the 
exception  of  BEST  and  AO,  have  nearly  the  same  value. 
The  differences  in  RMSE'  of  these  synthetics  are  not 


statistically  significant.  The  lowest  accuracy  in  both  MB' 
and  RMSE'  comes  from  the  zero-altimeter  synthetic  AO. 
This  is  because,  without  SSHA  assimilation,  NLOM  is 
not  constrained  at  all  by  SSHA.  These  lower  accuracy 
limits  of  MB'  and  RMSE'  are  approximately  —0.2  and 
0.45,  respectively,  for  nonparametric  statistics  and  the 
superdata  methodology. 

In  this  paper,  the  statistical  methodologies  are  found 
to  have  a  relatively  large  impact  on  the  results.  Evalua¬ 
tion  based  on  traditional  statistics  that  assume  Gaussian 
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Fig.  10.  The  fit  parameter  (superdata  rate  of  change;  in  m  day^')  for  the  observed  TDa  for  the  global  ocean  for 
a  60-day  analysis  window  starting  (a)  1  Mar  and  (b)  1  Nov  for  the  years  2001-03. 


data  distributions  result  in  inflated  error  and  error 
uncertainty  estimates.  Using  nonparametric  statistics 
circumvents  the  need  for  strict  and  often  unfounded 
assumptions  about  the  data  distribution.  The  two  com¬ 
parison  frameworks  used  in  this  analysis  have  a  lesser 
impact.  The  effect  of  the  superdata  methods  is  to  even 
the  influence  of  data  between  low-  and  high-density 
observation  regions.  Because  the  global  error  results  for 
the  point-for-point  framework  are  slightly  larger  than 
for  the  superdata  framework,  high-density  observation 
regions  tend  to  occur  in  areas  with  larger  errors. 


The  errors  associated  with  SSHA  (Fig.  5)  are  gener¬ 
ally  larger  than  the  errors  found  for  TD  (Fig.  6).  This  is 
due  to  representation  errors  associated  with  the  SSHA 
comparison,  because  there  are  no  direct  in  situ  estimates 
of  SSHA.  Instead,  we  use  the  difference  in  observed 
dynamic  height  from  climatology.  Climatology,  how¬ 
ever,  is  not  identical  to  the  mean  sea  surface  as  measured 
from  satellites  or  associated  with  NLOM.  These  mis¬ 
representations  add  to  the  errors  found  in  this  analysis. 
Because  MOD  AS  is  designed  to  provide  realistic  three- 
dimensional  temperature  and  salinity  structure,  the  TD 
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Fig.  11.  The  percent  error  for  TDa  A3EGT  fit  parameter  ao  (superdata  mean)  for  the  global  ocean  for  a  60-day 
analysis  window  starting  (a)  1  Apr  and  (b)  1  Dec  for  the  years  2001-03. 


estimates  have  greater  skill  relative  to  direct  in  situ 
profiles  observations. 

The  metric  TDy  has  poorer  accuracy  than  TD^  because 
ILDv  introduces  a  shallow  bias  in  the  ILD  estimates.  The 
ILDv  methodology  is  based  on  the  first  curvature  peak 
found  in  the  profile  and  therefore  returns  the  depth  of 
the  purely  isothermal  layer  near  the  surface.  This  exac¬ 
erbates  the  already  shallow  bias  of  MODAS,  further 
degrading  the  synthetic  accuracy.  Synthetics  and  other 
predictions  tend  to  smooth  the  sharp  gradients  found  in 
observations,  further  reducing  the  accuracy  of  both  TDy 


and  TDa.  The  more  unbiased  threshold  ILD-based  es¬ 
timate  TDa  does  not  reinforce  the  shallow  bias  of 
MODAS  and  leads  to  more  accurate  TD^  estimates. 

Using  summary  diagrams,  the  errors  scaled  by  the 
observation  standard  deviation  are  plotted  side  by  side 
for  consistent  comparison.  These  diagrams  are  repeated 
for  each  statistical  methodology  and  each  comparison 
framework.  In  each  rendition,  the  results  are  similar, 
with  the  exception  that  Gaussian  statistics  inflate  the 
error  and  error  uncertainty  levels.  The  nonparamet- 
ric  statistical  methods  indicate  lower  error  levels  and 
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larger  separation  between,  for  example,  AIJ  and  A3EGT 
synthetics,  where  the  standard  error  bars  do  not  overlap 
(Fig.  6).  The  traditional  Gaussian  statistical  methods  are 
unable  to  differentiate  these  cases  because  this  separation 
is  smaller  and  the  standard  error  bars  overlap. 

The  result  that  the  AIG  SSHA  estimates  tend  to  have 
better  accuracy  than  the  AIJ  SSHA  estimates  is  con¬ 
sistent  with  expected  error  estimates  of  Jacobs  et  al. 
(2002).  The  distance  between  ground  tracks  in  the  GFO 
orbit  is  about  125  km  at  35°N  with  a  17.05-day  repeat 
cycle,  whereas  the  distance  between  ground  tracks  in  the 
Jason-1  orbit  is  about  260  km  at  35°N,  with  a  9.95-day 
repeat  cycle.  The  characteristics  of  the  mesoscale  tend  to 
have  small  space  and  long  time  scales  relative  to  the 
altimeter  orbit  spacing  and  repeat  cycle,  respectively 
(Jacobs  et  al.  2001).  This  suggests  that  the  shorter  orbit 
spacing  should  be  an  advantage  for  the  AIG  SSHA  esti¬ 
mate,  a  hypothesis  corroborated  in  the  results  of  this  paper. 

Comparing  predictions  to  in  situ  observations  is  the 
ultimate  test  in  that  the  observations  are  the  closest  in¬ 
formation  we  have  to  the  true  ocean  condition.  This  type 
of  comparison,  however,  is  inherently  plagued  with 
representation  errors,  because  the  observations  gener¬ 
ally  have  more  physical  processes  influencing  the  mea¬ 
surements  on  shorter  space  and  time  scales  than  the 
predictions  are  able  to  resolve. 
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APPENDIX 


Robust  Nonparametric  Statistics 

To  compute  the  biweight  mean,  first  the  median  M 
and  the  median  absolute  deviation  (MAD)  are  com¬ 
puted  (e.g.,  Hoaglin  et  al.  1983).  Then,  the  weights  u,  for 
each  of  N  data  values  are  computed  such  that 


x.-M 

cMAD’ 


for  /=  1,2,3,  ...  ,N. 


(Al) 


The  recommended  value  for  the  parameter  c  is  7.5, 
which  corresponds  to  censoring  outliers  beyond  at  least 
five  standard  deviations  (Lanzante  1996).  The  censoring 
is  accomplished  by  using  the  condition  that  Uj  =  1.0  for 
any  j  where  \uj\  >  1  (Hoaglin  et  al.  1983).  The  biweight 
estimate  of  the  mean  is  then 


(  N 

I(x  -M)(1-«2)2 
/=!  ' 


X  (1  -  ii)f 


y=i 


^  (A2) 


It  is  important  to  note  that  this  weighting  factor  has 
a  larger  magnitude  for  larger  deviations  of  x  from  M. 
The  censoring  is  accomplished  by  setting  iij  =  1  and 
therefore  1  —  m?  =  0  for  deviations  from  the  median  M 
defined  by  the  censor  values  c.  Also  note  that  there  is 
a  small  typographical  error  in  Lanzante  (1996)  and 
that  the  correction  can  be  found  online  (at  http://www. 
gfdl.noaa.gov/~jrl/jrl_webpages/manuscripts/resistant/ 
IMPORTANT). 

The  biweight  standard  deviation  is  computed  as 


N 

aZ(x.-M)2(1-k2)4 

i=l  ' 

-|l/2 

N 

S(1  -  uj)(l  -  5uj) 
7=1 

(A3) 


Because  Eq.  (A3)  is  an  unbiased  estimate  of  the  scale 
of  X,  this  corresponds  to  the  unbiased  measure  of  the 
standard  deviation. 
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